Stain normalization for automated whole-slide image classification

ABSTRACT

Techniques for stain normalization image processing for digitized biological tissue images are presented. The techniques include obtaining a digitized biological tissue image; applying to at least a portion of the digitized biological tissue image an at least partially computer implemented convolutional neural network trained using a training corpus including a plurality of pairs of images, where each pair of images of the plurality of pairs of images includes a first image restricted to a lightness axis of a color space and a second image restricted to at least one of: a first color axis of the color space and a second color axis of the color space, such that the applying causes an output image to be produced; and providing the output image.

RELATED APPLICATIONS

This application claims priority to, and the benefit of, both U.S. Provisional Patent Application No. 62/904,146 entitled “Automated Whole-Slide Image Classification Using Deep Learning” filed Sep. 23, 2020, and U.S. Provisional Patent Application No. 62/904,263 entitled “Stain Normalization for Automated Whole-Slide Image Classification” filed Sep. 23, 2020, which are both hereby incorporated by reference in their entireties.

FIELD

This disclosure relates generally to pathology.

BACKGROUND

Every year in the United States, twelve million skin lesions are biopsied, with over five million new skin cancer cases diagnosed. After a skin tissue specimen is biopsied, the tissue is fixed, embedded, sectioned, and stained with hematoxylin and eosin (H&E) on (one or several) glass slides, ultimately to be examined under a microscope by a dermatologist, general pathologist, or dermatopathologist who provides a diagnosis for each tissue specimen. Owing to the large variety of over 500 distinct skin pathologies and the severe consequences of a critical misdiagnosis, diagnosis in dermatopathology demands specialized training and education. Although inter-observer concordance in dermatopathology is estimated between 90% and 95%, there are some distinctions that present frequent disagreement among pathologists, such as in the case of melanoma vs. benign melanocytic nevi. However, even when diagnosis is accurate, the process can be made more efficient by reducing the turnaround time for each case and by improving pathologist workload distribution. Often, cases are first sent to a generalist—sometimes a dermatologist who diagnoses specimens biopsied in their clinical practice. Only if the diagnosis is not a straightforward one is it sent to a specialist to diagnose. This can result in a delay of days before the patient receives a diagnosis in sometimes-critical cases. The rise in adoption of digital pathology provides an opportunity to use deep learning-based methods for closing these gaps in diagnostic reliability and efficiency.

In recent years, attempts have been made to use deep neural networks for identifying diagnostically relevant patterns in radiology and pathology images. While some attempts appear worth pursuing, the translation of such methods to digital pathology is non-trivial. Among the reasons for this is sheer image size; a whole slide image can contain several gigabytes of image data and billions of pixels. Additionally, non-standardized image appearance (variability in tissue preparation, staining, scanned appearance, presence of artifacts) and the number of pathologic abnormalities that can be observed present unique barriers to development of deployable deep learning applications in pathology. For example, it is known that inter-site variance—in the form of stain and other image properties—can have a strong impact on deep learning models. Nonetheless, deep learning-based methods have recently shown promise in segmentation tasks, which can be used to compute features for traditional classifiers, and more recently for some classification tasks. However, many focus only on a single diagnostic class to make binary classifications, whose utility breaks down when there is more than one relevant pathology of interest. Additionally, many of these methods have focused on curated datasets consisting of fewer than five pathologies with little diagnostic and image variability.

The insufficiency of known models developed and tested using small curated datasets such as CAMELYON16 has been effectively demonstrated. See Campanella, G. et al., Clinical-grade computational pathology using weakly supervised deep learning on whole slide images, Nature Medicine 1 (2019), hereinafter, “Campanella”. However, while claiming to validate on data free of curation, their dataset features limited capture of not only biological variability (e.g., it ignores commonly occurring prostatic intraepithelial neoplasia and atypical glandular structures) but also technical variability originating from slide preparation and scanning characteristics (resulting in exclusion of slides with pen markings, need for retrospective human correction of select results, and poorer performance on externally-scanned images). In contrast to these deep learning systems exposed to contrived pathology problems and datasets, human pathologists are trained to recognize hundreds of morphological variants of diseases they are likely to encounter in their careers and must adapt to variations in tissue preparation and staining protocols. Deep learning algorithms can also be sensitive to image artifacts, in addition to these variations. Some have attempted to account for these issues by detecting and pre-screening image artifacts, by automatically or manually removing slides with artifacts. Campanella et al. include variability in nonexcluded artifacts which others lack, but still selectively exclude images with ink markings, which have been shown to affect predictions of neural networks.

In addition to the problems identified above regarding automated pathology, problems exist in training and using classifiers due to variations among whole slide images produced by various labs. Because the appearance of a whole slide image varies from lab to lab, scanner to scanner, with the same scanner over time, and even varies based on the brand of stain used, images acquired from different labs or scanners are not effectively interpretable by a model which is trained on the variations of a single lab or image appearance. To the trained human eye, stain differences may be the most dramatic differences between different labs' images. Therefore, much of the work in the field so far has focused on normalizing stain. Typically, stain normalization attempts to extract the two main components of the image color, which come from the hematoxylin (H) and eosin (E) stains. There are many methods of stain normalization, and most of the previous gold standard methods have not used any form of deep learning. They typically require a target image, extract the components of that image, and apply a transform computed between those and the components of the image to be normalized. This has several disadvantages:

1. It requires a target or reference image, which is a single example which should represent the “ideal” stain. It is difficult to select an image which has some “ideal” stain properties, and yet exhibits enough variation in tissue types to be useful for comparing images.

2. Methods like Vahadane stain normalization are highly sensitive to the target image, as well as to any background or atypically-colored regions included in either the target or to-be-normalized image. It is therefore easy to get an erroneous/improbable stain normalization result, or one that is not representative.

3. As a result of the above, similar images, e.g., those from slightly separated slices of tissue, will produce inconsistent normalized stain results, which is very problematic when training a model to classify a pathology.

In addition to changes in the color or stain of the images when compared between labs, other changes in image appearance between labs may be present. Some images may have more or less noise, or differences in brightness and contrast, hue and saturation, or orientation of the tissue on the slide. While minor changes in these variables might not much affect a human's ability to diagnose a case from the images, these factors can be important to a deep learning system's ability to correctly identify pathology. There are not currently any methods of correcting for any of these factors in terms of pre-processing images to be assessed by a deep learning system.

SUMMARY

According to various embodiments, a method of stain normalization image processing for digitized biological tissue images is presented. The method includes obtaining a digitized biological tissue image; applying to at least a portion of the digitized biological tissue image an at least partially computer implemented convolutional neural network trained using a training corpus comprising a plurality of pairs of images, wherein each pair of images of the plurality of pairs of images comprises a first image restricted to a lightness axis of a color space and a second image restricted to at least one of: a first color axis of the color space and a second color axis of the color space, whereby the applying causes an output image to be produced; and providing the output image.

Various optional features of the above embodiments include the following. The digitized biological tissue image may include a digitized whole-slide image, and the applying may include applying to the whole-slide image. The providing may include providing to a process comprising an electronic trained classifier, wherein the electronic trained classifier is trained to identify at least one human biological tissue pathology. The digitized biological tissue image may include a digitized cutaneous biological tissue image, and the at least one biological tissue pathology may comprise at least one human dermatopathology. Each pair of images of the plurality of pairs of images may include a first image restricted to a lightness axis of a Lab color space and a second image restricted to an axis ‘a’ of the Lab color space and to an axis ‘b’ of the Lab color space. Each second image may be restricted to colors of hematoxylin and eosin. The plurality of pairs of images may include a particular plurality of pairs of images, where each pair of images of the particular plurality of pairs of images comprises a first image that has had noise added to it. The noise may include at least one of hue noise, saturation noise, brightness noise, contrast noise, or intensity noise. The plurality of pairs of images may include a rotated plurality of pairs of images, wherein each pair of images of the rotated plurality of pairs of images comprises a first image that has been rotated by an amount, and a second image that has been rotated by the same amount. The training corpus may consist of pairs of images derived from images obtained by a single laboratory.

According to various embodiments, a system for stain normalization image processing for digitized biological tissue images is presented. The system includes at least one electronic processor and at least one persistent electronic memory communicatively coupled to the at least one electronic processor, the at least one persistent memory comprising computer readable instructions that, when executed by the at least one electronic processor, configure the at least one electronic processor to perform operations comprising: obtaining a digitized biological tissue image; applying to at least a portion of the digitized biological tissue image an at least partially computer implemented convolutional neural network trained using a training corpus comprising a plurality of pairs of images, wherein each pair of images of the plurality of pairs of images comprises a first image restricted to a lightness axis of a color space and a second image restricted to at least one of: a first color axis of the color space and a second color axis of the color space, whereby the applying causes an output image to be produced; and providing the output image.

Various optional features of the above embodiments include the following. The digitized biological tissue image may include a digitized whole-slide image, and the applying may include applying to the whole-slide image. The providing may include providing to a process comprising an electronic trained classifier, wherein the electronic trained classifier is trained to identify at least one human biological tissue pathology. The digitized biological tissue image may include a digitized cutaneous biological tissue image, and the at least one biological tissue pathology may comprise at least one human dermatopathology. Each pair of images of the plurality of pairs of images may include a first image restricted to a lightness axis of a Lab color space and a second image restricted to an axis ‘a’ of the Lab color space and to an axis ‘b’ of the Lab color space. Each second image may be restricted to colors of hematoxylin and eosin. The plurality of pairs of images may include a particular plurality of pairs of images, wherein each pair of images of the particular plurality of pairs of images comprises a first image that has had noise added to it. The noise may include at least one of hue noise, saturation noise, brightness noise, contrast noise, or intensity noise. The plurality of pairs of images may include a rotated plurality of pairs of images, wherein each pair of images of the rotated plurality of pairs of images comprises a first image that has been rotated by an amount, and a second image that has been rotated by the same amount. The training corpus may consist of pairs of images derived from images obtained by a single laboratory.

BRIEF DESCRIPTION OF THE DRAWINGS

Various features of the embodiments can be more fully appreciated, as the same become better understood with reference to the following detailed description of the embodiments when considered in connection with the accompanying figures, in which:

FIG. 1 is a schematic diagram of a system for classifying a whole slide image using a deep learning classifier according to various embodiments;

FIG. 2 depicts receiver operating curves by lab, class, and confidence according to an example reduction to practice;

FIG. 3 depicts a Sankey diagram depicting how ground truth classes map to the top five most common diagnoses according to various embodiments;

FIG. 4 depicts image feature vectors in two-dimensional t-distributed stochastic neighbor (t-SNE) embedded plots according to an example reduction to practice;

FIG. 5 depicts execution times per whole slide image, computed in a set of 1,536 whole slide images from three test labs;

FIG. 6 is a flow chart for a method of automated whole slide image classification using deep learning according to various embodiments;

FIG. 7 is a flow diagram for a method of determining a threshold corresponding to a confidence level for classifications according to various embodiments; and

FIG. 8 is a high-level flow diagram of a method for stain normalization using deep learning according to various embodiments.

DESCRIPTION OF THE EMBODIMENTS

Reference will now be made in detail to example implementations, illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. In the following description, reference is made to the accompanying drawings that form a part thereof, and in which is shown by way of illustration specific exemplary embodiments in which the invention may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the invention, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the invention. The following description is, therefore, merely exemplary.

This description includes two main parts. Part I discloses techniques for automated whole slide image classification using deep learning, including techniques for determining an optional threshold corresponding to a confidence level. Part II discloses techniques for stain normalization. The techniques of Part I may be used with or without the techniques of Part II, and the techniques of Part II may be used in the context of the techniques of Part I or independently.

I. Automated Whole Slide Image Classification Using Deep Learning

A. Introduction

A real-world deep learning pathology system should be demonstrably robust to the variations noted by Campanella as described above in the Background. It should be tested on non-selected specimens, with no exclusions and no pre-screening or post-screening. A comprehensive test set for robustly assessing system performance should contain:

1. Images from multiple labs, with markedly varied stain and image appearance due to scanning using different models and vendors, and different tissue preparation and staining protocols;

2. Images wholly representative of a diagnostic workload in the subspecialty (i.e., not excluding pathologic or morphologic variations which occur in a sampled time-period);

3. Images with a host of naturally-occurring and human-induced artifacts: scratches, tissue fixation artifacts, air bubbles, dust and dirt, smudges, out-of-focus or blurred regions, scanner-induced misregistrations, striping, pen ink or letters on slides, inked tissue margins, patching errors, noise, color/calibration/light variations, knife-edge artifacts, tissue folds, and lack of tissue present; and

4. (In some instances) images with no visible pathology, or with no conclusive diagnosis, covering the breadth of cases occurring in diagnostic practice.

This disclosure presents a pathology deep learning system (PDLS) that is capable of classifying whole slide images containing H&E-stained and prepped skin biopsy or re-excised tissue into one of four diagnostically-relevant classes based on tissue appearance. Some embodiments return a measure of confidence in the model's assessment; this is useful in such classifications because of the wide range of variability in the data. A lab-ready system should be able to not only return accurate predictions for commonly occurring pathologies and image appearances, but also flag the significant remainder of images whose unusual features lie outside the range allowing reliable model prediction.

A reduction to practice was developed on whole slide images from a single lab and independently tested on a completely uncurated and unrefined set of 13,537 sequentially accessioned H&E-stained images from three additional labs, each using a different scanner and different staining and preparation protocol. No images were excluded. To the inventors' knowledge, this test set is the largest in pathology to date. The reduction to practice satisfied all the criteria listed above for real-world assessment, and is therefore, to the inventors' knowledge, the first truly real-world-validated deep learning system in pathology.

B. Whole Slide Image Classification

Some embodiments provide a computer-implemented system for, and method of, classifying a tissue specimen. The tissue specimen may be human tissue, such as, by way of non-limiting example, a human cutaneous tissue sample. Embodiments may include obtaining a computer readable image of the tissue specimen, e.g., on a computer readable medium, over a network such as the internet, or from digitizing a whole slide image. Embodiments may segment the image into a first plurality of segments, e.g., using computer vision such as Otsu's thresholding or using a convolutional neural network. Next, embodiments may optionally perform stain normalization and image adaptation on the image, e.g., using CNN-1 (106) as described in detail below in reference to FIG. 1. Embodiments may then select, from among the first plurality of segments, a second plurality of segments that include at least one region of interest, e.g., using CNN-2 (108) as described in detail below in reference to FIG. 1. Embodiments may then apply, to the second plurality of segments, an electronic convolutional neural network trained by a training corpus comprising a set of pluralities of tissue sample image segments, each of the pluralities of tissue sample image segments comprising image segments from within a same tissue sample image, each of the pluralities of tissue sample image segments labeled according to one of a plurality of primary pathology classes, where the plurality of primary pathology classes consist of a plurality of majority primary pathology classes, where the plurality of primary pathology classes collectively comprise a majority of pathologies of a particular tissue type according to prevalence, and a class for tissue sample image segments not in the plurality of majority primary pathology classes, such that a classification primary pathology class is output. Embodiments may utilize CNN-3 (111) as described in detail below in reference to FIG. 1 to that end. Embodiments may then provide the classification primary pathology class, e.g., by displaying it on a computer monitor or sending a message such as an email. Example embodiments are described in detail presently.

FIG. 1 is a schematic diagram of a system 100 for classifying a whole slide image using a deep learning classifier according to various embodiments. System 100 takes as input whole slide image 102 and classifies it using a cascade of three independently-trained convolutional neural networks, CNN-1 (106), CNN-2 (108), and CNN-3 (111), as follows. CNN-1 (106) adapts the image appearance to a common feature domain, accounting for variations in stain and appearance. CNN-2 (108) identifies regions of interest (ROI) for further processing. The final network, CNN-3 (111), classifies the whole slide image into one of four classes defined broadly by their histologic characteristics: basaloid, melanocytic, squamous, or other, as further described below. Though the classifier operates at the level of an individual whole slide image, some specimens are spread across multiple whole slide images, and therefore these decisions may be aggregated to the specimen level. According to some embodiments, CNN-3 (111) is trained such that each image result returns a predicted confidence in the accuracy of the outcome, along with the predicted class. This allows discarding predictions that are determined by such embodiments as likely to be false.

An example process for classifying a whole slide image with the deep learning system is described presently with respect to FIG. 1. In brief, an input whole slide image 102 is first divided into tissue patches according to a tiling procedure 104; those patches pass through CNN-1 (106), which adapts their stain and appearance to the target domain; they then pass through CNN-2 (108), which identifies the regions of interest (patches) to pass to CNN-3 (111), which performs a four-way classification and repeats it multiple times to yield multiple predictions. These predictions are then converted into an output classification into one of the four classes. Further processing may be performed in order to more finely classify the whole slide image within its classification class. The convolutional neural networks of system 100 are discussed in detail presently.
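
For concreteness, the staged flow just described can be expressed in code. The following is a minimal sketch in Python; the callables cnn1, cnn2, and cnn3 and the helpers tile_image and select_roi_tiles are hypothetical stand-ins for the trained networks and pre-processing described above, not the actual implementation.

```python
def classify_wsi(wsi, cnn1, cnn2, cnn3):
    """Illustrative staging of system 100 for one whole slide image."""
    tiles = tile_image(wsi)                   # tiling procedure 104 (hypothetical helper)
    adapted = [cnn1(tile) for tile in tiles]  # stain/appearance adaptation (CNN-1)
    roi = select_roi_tiles(adapted, cnn2)     # region-of-interest extraction (CNN-2)
    if not roi:
        return None                           # no ROI detected: leave unclassified
    # CNN-3 performs the repeated four-way classification internally and
    # returns a class decision (see FIG. 1 and Section I(C)).
    return cnn3(roi)
```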

Because system 100 may be trained only on a single lab's data, implementations may first perform image adaptation to adapt images received from test labs to a domain where the image features are interpretable by system 100. Without adaptation, unaccounted-for variations in the images due to staining and scanning protocols can adversely affect the performance of convolutional neural networks. In system 100, image adaptation is shown as being performed using CNN-1 (106), which takes as input an image tile of whole slide image 102 and outputs an adapted tile of the same size and shape but with standardized image appearance. In the reduction to practice, CNN-1 (106) was trained using 300,000 tiles from the Reference Lab, to mimic the average image appearance from the Reference Lab when given an input tile.

Subsequently, region of interest extraction is performed using CNN-2 (108). This convolutional neural network may be trained using expert annotations by a dermatopathologist as the ground truth. It may be trained to segment regions exhibiting abnormal features indicative of pathology. The model takes as input a single tile and outputs a segmentation map. Tiles may be selected corresponding to the positive regions of the segmentation map; the set of all identified tiles of interest, t, is passed on to the final-stage classifier.
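
A tile-selection step consistent with this description might look like the following sketch, where seg_model stands in for CNN-2 and the 0.5 cutoff and positive_fraction parameter are illustrative assumptions rather than values given in this disclosure.

```python
def select_roi_tiles(tiles, seg_model, positive_fraction=0.05):
    """Keep tiles whose predicted segmentation map is sufficiently positive.

    seg_model maps a tile to a per-pixel probability map in [0, 1]; a tile
    is kept when at least `positive_fraction` of its pixels are flagged.
    """
    selected = []
    for tile in tiles:
        seg_map = seg_model(tile)
        if (seg_map > 0.5).mean() >= positive_fraction:
            selected.append(tile)
    return selected  # the set t passed on to CNN-3
```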

The final whole slide image classification is then performed using CNN-3 (111), which predicts a label l for the set of tiles t identified by CNN-2 (108), where:

l ∈ {Basaloid; Squamous; Melanocytic; Others}

The design of target classes 110 may be based on the prevalence of each class's constituent pathologies and the presence of visually-similar and histologically-similar class representative features. Such a prevalence may be a prevalence in the general population. Specifically, embodiments may perform classification of whole slide images into four classes 110: Basaloid, Squamous, Melanocytic, and Others. These four classes 110 may be defined by the following histological descriptions of their features:

1. Basaloid: Abnormal proliferations of basaloid-oval cells having scant cytoplasm and focal hyperchromasia of nuclei; cells in islands of variable size with round, broad-based and angular morphologies; peripheral palisading of nuclei, peritumoral clefting, and a fibromyxoid stroma.

2. Squamous: Squamoid epithelial proliferations ranging from a hyperplastic, papillomatous and thickened spinous layer to focal and full thickness atypia of the spinous zone, as well as invasive strands of atypical epithelium extending into the dermis at various levels.

3. Melanocytic: Cells of melanocytic origin in the dermis, in symmetric nested and diffuse aggregates and within the intraepidermal compartment as single cell melanocytes and nests of melanocytes. Nests may be variable in size, irregularly spaced, and single cell melanocytes may be solitary, confluent, hyperchromatic, pagetoid and with pagetoid spread into the epidermis. Cellular atypia can range from none to striking anaplasia and may be in situ or invasive.

4. Others: Morphologic and histologic patterns that include either the absence of a specific abnormality or one of a wide variety of other neoplastic and inflammatory disorders which are both epithelial and dermal in location and etiology, and which are confidently classified as not belonging to classes 1-3.

These four classes 110 account for more than 200 pathologic entities in the reduction to practice training set, and their mapping to the most prevalent pathologic entities in the training set is illustrated below in FIG. 3.

Based on these classes, CNN-3 (111) performs a four-way classification into classes 110, and repeats this multiple (e.g., thirty) times to yield multiple predictions, where each prediction P_(i) may be represented as a vector (e.g., vector 112) of dimension N_(classes)=4. Each prediction may be performed using a randomly selected subset of neurons of CNN-3 (111), e.g., 70% of the full set. A mean operation 114 is applied class-wise to obtain class mean vector 116, which includes means of the sigmoid outputs for each class. A maximum operation 118 is then applied to class mean vector 116, which identifies the maximum 120 over the classes. The max 120 of the mean 116 of the sigmoid outputs is used both for the prediction and, optionally, for a confidence score (described below in detail in Section I(C)). If the confidence score surpasses a pre-defined threshold, the corresponding class decision 122 is assigned.
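
This repeated-prediction scheme can be sketched as Monte Carlo dropout in PyTorch. The following is an illustrative approximation, assuming a model whose dropout layers supply the random neuron subsets and whose output is a vector of four logits; the 0.33 threshold shown is the Level 1 value reported for the reduction to practice in Section I(C).

```python
import torch

@torch.no_grad()
def predict_with_confidence(model, tiles, T=30, threshold=0.33):
    """Mean-then-max prediction over T stochastic forward passes."""
    model.train()  # keep dropout active at inference; no_grad prevents weight updates
    preds = torch.stack([torch.sigmoid(model(tiles)) for _ in range(T)])  # shape (T, 4)

    class_means = preds.mean(dim=0)           # class mean vector 116
    confidence, idx = class_means.max(dim=0)  # maximum operation 118 -> max 120
    classes = ["Basaloid", "Squamous", "Melanocytic", "Others"]
    if confidence.item() < threshold:
        return None, confidence.item()        # below threshold: unclassified
    return classes[idx.item()], confidence.item()  # class decision 122
```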

Diagnostic labels may be reported at the level of a specimen, which may be represented by one or several whole slide images. Therefore, the predictions of system 100 may be aggregated across whole slide images to the specimen level; this is accomplished by assigning to a given specimen the maximum-confidence prediction across all whole slide images representing that specimen.
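
A minimal sketch of this aggregation rule follows, assuming each whole slide image has already been reduced to a (label, confidence) pair as above; the pair format is an assumption for illustration.

```python
def aggregate_to_specimen(wsi_predictions):
    """Assign a specimen the maximum-confidence prediction among its WSIs.

    wsi_predictions: list of (label, confidence) pairs, one per whole slide
    image of the specimen; label may be None when no ROI was detected.
    """
    scored = [(label, conf) for label, conf in wsi_predictions if label is not None]
    if not scored:
        return None, 0.0  # no classifiable whole slide image for this specimen
    return max(scored, key=lambda pair: pair[1])
```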

In the reduction to practice introduced at the end of Section I(A), the training data for system 100 was developed using H&E-stained whole slide images from Dermatopathology Laboratory of Central States (DLCS), which is referred to as the “Reference Lab” herein. This dataset is made up of two subsets, the first (3,070 whole slide images) consisting of images representing commonly diagnosed pathologic entities, and the second (2,000 whole slide images) consisting of all cases accessioned during a discrete period of time, representing the typical distribution seen by the lab. This combined Reference Lab set of 5,070 whole slide images was partitioned randomly into training (70%), validation (15%), and testing (15%) sets, such that whole slide images from any given specimen are not split between sets.

To demonstrate its robustness to variations in scanners, staining, and image acquisition protocols, the reduction to practice was also tested on 13,537 whole slide images collected from three of the largest-volume dermatopathology labs in the United States (referred to as “Test Labs”). Each Test Lab selected a date range within the past four years (based on slide availability) from which to scan a sequentially accessioned set of approximately 5,000 slides. All parameters and stages of the reduction to practice pipeline were held fixed after development on the Reference Lab, with the exception of CNN-3 (111), whose weights were fine-tuned independently using the 520-image calibration set of each lab. (This process is referred to herein as “calibration”.) The calibration set for each lab consisted of the first 500 whole slide images supplemented by twenty additional whole slide images from melanoma specimens. 80% of the 520 images were used for fine-tuning, and 20% for lab-specific validation of the fine-tuning and image adaptation procedures. Specimens from the same patient were not split between fine-tuning, validation and test sets. Each of the three Test Labs scanned their slides using a different scanner vendor and model. After this calibration, all parameters were permanently held fixed, and the system was run only once on each lab's test set of approximately 4,500 whole slide images (range 4,451 to 4,585), for 13,537 in total.

Results are reported for the test set, consisting of 13,537 whole slide images from the three test labs which were not used in model training or development. The reduction to practice effectively classified whole slide images into the four classes with an overall accuracy of 78% before thresholding on confidence score. Importantly, in specimens whose predictions exceeded the confidence threshold, the reduction to practice achieved an accuracy of 83%, 94%, and 98% for Confidence Levels 1, 2 and 3, respectively. Further discussion of the performance of the reduction to practice follows immediately below in reference to FIG. 2.

FIG. 2 depicts receiver operating curves 202, 204, 206, 208, 210, 212, 214, 216, 218, 220 by lab, class, and confidence for the test set of 13,537 images according to the example reduction to practice. ROC curves are shown for basaloid (202, 212), melanocytic (204, 216), squamous (206, 218) and other (208, 220) classes, with the percentage of specimens classified for each curve represented by the shading bar at right. The four curves (202, 204, 206, 208) represent the respective thresholded confidence levels or no confidence threshold (“None”). As confidence level increases, a larger number of images do not meet the threshold and are excluded from the analysis, as indicated by the shading. At Levels 1, 2, and 3, the percentage of test specimens exceeding the confidence threshold was 83%, 46% and 20%, respectively. Area under the curve (AUC) increased with increasing confidence level. Similar results are shown for Level 1 for the test labs; the curves in (212, 216, 218, 220) represent each of the three labs. In curve 210, validation set accuracy in the Reference Lab is plotted versus sigmoid confidence score, with dashed lines corresponding to the sigmoid confidence thresholds set (and fixed) at 90% (Level 1), 95% (Level 2), and 98% (Level 3). Curve 210 thus depicts empirical overall accuracy according to confidence threshold.

FIG. 3 depicts a Sankey diagram 300 depicting how ground truth classes 304 map to predicted proportions 306, along with the top five most common diagnoses 302 for each class, according to various embodiments. That is, FIG. 3 shows the mapping of ground truth class 304 to the proportion correctly predicted 306, as well as the proportions confused for each of the other classes or remaining unclassified (at Level 1) due to lack of a confident prediction or absence of any ROI detected by CNN-2 (108). FIG. 3 thus depicts the proportion of images correctly classified, along with the distribution of misclassifications and unclassified specimens at confidence Level 1 (306). The width of each bar is proportional to the corresponding number of specimens in the three-lab test set. Additionally, FIG. 3 shows the most common ground-truth diagnoses 302 in each of the four classes 304.

FIG. 4 depicts image feature vectors in two-dimensional t-distributed stochastic neighbor (t-SNE) embedded plots 402, 404, 406, 408, 410, 412 for the reduction to practice. To demonstrate that the image adaptation performed by CNN-1 (106) effectively reduces inter-site differences, the inventors used t-distributed stochastic neighbor embedding (t-SNE) to compare the feature space computed by CNN-2 (108) with and without first performing the image adaptation. In FIG. 4, plot 402 depicts the embedded feature space of CNN-2 (108) without first performing image adaptation, and plot 404 depicts the embedded feature space from CNN-2 (108) when image adaptation is performed first. Each point represents an image patch within a whole-slide image, colored by lab.

The bottom row of plots in FIG. 4 depicts feature embeddings from CNN-3 (111), where each point represents a single whole slide image and is colored according to ground-truth classification. Thus, t-SNE as depicted in FIG. 4 shows the internal feature representation learned by the final classifier, CNN-3 (111), in plot 406. All images are classified at baseline in plot 406, while plots 408, 410, 412 show increasing confidence thresholds (plot 408 for Level 1, plot 410 for Level 2, and plot 412 for Level 3), with images not meeting the threshold depicted lighter. The clustering shows strong class separation between the four classes, with stronger separation and fewer whole slide images classified as confidence level increases.

FIG. 5 depicts execution times per whole slide image, computed in a set of 1,536 whole slide images from three test labs. The median percentage of total execution time for each stage of the deep learning system is shown at 502. A boxplot of the execution time required at each stage of the pipeline is shown at 504, along with total end-to-end execution time for all images (506), and excluding images for which no regions of interest are detected (508).

Compute time profiling was performed on an Amazon Web Services EC2 P3.8xlarge instance equipped with 32-core Intel Xeon E5-2686 processors, 244 GB RAM, and four 16 GB NVIDIA Tesla V100 GPUs supported by NVLink for peer-to-peer GPU communication. Compute time was measured on the calibration sets of each of the test labs, 1,536 whole slide images in total.

In general, execution time for any system to be implemented in a lab workflow should be low enough to not present an additional bottleneck to diagnosis. Therefore, the proposed system was designed to be parallelizable across whole slide images to enhance throughput and meet the efficiency demands of a real-world system. For the reduction to practice, on a single compute node, the median processing time per slide was 2.5 minutes, with an overall throughput of 40 whole slide images per hour.

The reduction to practice and the above results demonstrate the ability of a multi-site generalizable pathology deep learning system to accurately classify the majority of specimens in a typical dermatopathology lab workflow. Developing a deep-learning-based classification which translates across image sets from multiple labs is non-trivial. Without compensation for image variations, non-biological differences between data from different labs are more prominent in feature space than biological differences between pathologies. This is demonstrated by plot 402 of FIG. 4, in which the image patches cluster according to the lab that prepared and scanned the corresponding slide. When image adaptation is performed prior to computing image features, the images do not appear to strongly cluster by lab (404 of FIG. 4). The reduction to practice further demonstrates that a pathology deep learning system trained on the single Reference Lab can be effectively calibrated to three additional lab sites. Plots 406, 408, 410, and 412 of FIG. 4 show strong class separation between the four classes, and this class separation strengthens with increasing confidence threshold. Intuitively, low-confidence images cluster at the intersection of the four classes. Strong class separation is reflected also in the ROC curves, which show high AUC across classes and labs, as seen in FIG. 2. AUC increases with increased confidence level, demonstrating the utility of confidence score thresholding as a tunable method for excluding poor model predictions. Plots 202, 204, 206, and 208 of FIG. 2 show relatively worse performance in the Squamous and Other classes, which is reflected in FIG. 4 by some overlap between the two classes in feature space; FIG. 3 also shows some confusion between these two classes but overall demonstrates accurate classification of the majority of specimens from each class.

The majority of previous deep learning systems in digital pathology have been validated only on a single lab's or scanner's images, used curated datasets that ignored a portion of lab volume within a specialty, tested on small and unrepresentative datasets, ineffectively balanced their datasets, excluded images with artifacts, selectively revised image “ground truth” retrospectively for misclassifications, or trained patch-based or segmentation-based models while using traditional computer vision or heuristics to arrive at a whole slide prediction. Such methods do not lend themselves to real-world enabled deep learning that is capable of operating independently of the pathologist and prior to pathologist review. Other models may require some human intervention before they can provide useful information about a slide, and therefore do not enable improvements in lab workflow efficiency. In contrast, embodiments may be trained on all available slides—images with artifacts, slides without tissue on them, slides with poor staining or tissue preparation, slides exhibiting rare pathology and those with little evidence of pathology.

All of this variability in the data itself requires that embodiments be capable of determining when it is not possible to make a well-informed prediction. This is accomplished with a confidence score, which can be thresholded to obtain better system performance, as shown in FIG. 2 and described in detail below in Section I(C). Note that the correlation between system accuracy and confidence was established a priori using only the Reference Lab validation set (see curve 210 of FIG. 2) to fix the three confidence thresholds. Fixing thresholds a priori establishes that they are generalizable. Campanella attempts to set a classification threshold that yields optimal performance; however, they perform this thresholding using the sigmoid output of a model, on the same test set in which they report it yielding 100% sensitivity; therefore they do not demonstrate the generalizability of this tuned parameter. Secondly, as demonstrated by Gal, Y. & Ghahramani, Z., Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, International Conference on Machine Learning, 1050-1059 (2016), a model's predictive probability cannot be interpreted as a measure of confidence.

Note that all performance measures (accuracy, AUC) are reported at the level of a specimen, which may consist of several slides, since diagnosis is not reported at the slide level in dermatopathology. All slide-level decisions are aggregated to the specimen level as described herein; this is particularly useful as not all slides within a specimen will contain pathology, and therefore an incorrect prediction can be made if slide-level reporting is performed. Most known systems have not attempted to solve the problem of aggregating slide decisions to the specimen level at which diagnosis is performed.

For an embodiment to operate before a pathologist's assessment, the entire pipeline should be able to run in a reasonable time period. The compute time profile shown in FIG. 5 demonstrates that embodiments can classify a whole slide image in under three minutes in the majority of cases, which is on the order of the time it takes today's scanners to scan a single slide. There was considerable variation in this number due to a large amount of variability in the size of the tissue to be processed. However, it is important to note that this process can be parallelized across whole slide images without limit to enhance throughput.

Embodiments have the potential to increase diagnostic efficiency in several situations. For example, embodiments can enable dermatologists and general pathologists to know ahead of time which cases could be potentially challenging, and automatically leave them to a subspecialty expert, avoiding unnecessary delays. Further, it is expected that, as a result of pre-sorting cases with an embodiment, time-to-diagnosis can be shortened. Additionally, pathologists might choose to prioritize certain classes that most often contain the critical cases—cases that need to be reviewed earlier in the day because they require more examination or additional tests or stains.

In sum, the deep learning system presented delivers accurate prediction, regardless of scanner type or lab, and requires fewer than 500 slides for calibration to a new site. Some embodiments are capable of assessing which of their decisions are viable based on a computed confidence score, described below in Section I(C), and thereby can filter out decisions that are unlikely to be correct. Furthermore, the classification performed by some embodiments enables development of accessory machine learning models which narrow down a diagnosis within each class. This might enable further prioritization of extreme cases, such as those presenting features of melanoma. The techniques presented herein—for example, deep learning of heterogeneously-composed classes and confidence-based prediction screening—are not limited to application in dermatopathology or even pathology, but broadly demonstrate potentially effective strategies for translational application of deep learning in medical imaging. This confidence-based strategy is broadly applicable for achieving the low error rates necessary for practical use of machine learning in challenging and nuanced domains of medical disciplines.

FIG. 6 is a flow chart for a method 600 of automated whole slide image classification using deep learning according to various embodiments. Method 600 may be performed by a computer system that includes at least one electronic processor and electronic persistent memory, coupled to the at least one electronic processor, that includes instructions that configure the at least one electronic processor to perform the actions of method 600. For example, method 600 may be implemented using system 100 of FIG. 1.

Method 600 accepts an image, such as a whole-slide image, and may output one or both of a primary pathology classification and a secondary pathology classification. The primary pathology classification may be one of a basaloid patterns class, squamous patterns class, melanocytic patterns class, or other patterns class.

The secondary pathology classification may be a specific diagnosis within a respective primary pathology classification. Thus, for a primary basaloid patterns classification, the secondary classification may be one of: nodular basal cell carcinoma, multicentric basal cell carcinoma, basal cell carcinoma, ulcerative basal cell carcinoma, infiltrative basal cell carcinoma, or remaining basaloids. For a primary melanocytic patterns classification, the secondary classification may be one of: dysplastic nevus, compound nevus, dermal nevus, lentigo, junctional nevus, malignant melanoma, or remaining melanocytic. For a primary squamous patterns classification, the secondary classification may be one of: squamous cell carcinoma, seborrheic keratosis, verruca, actinic keratosis, lichenoid keratosis, or remaining squamous cell carcinomas. For a primary other patterns classification, the secondary classification may be one of: epidermal inclusion cyst, spongiotic dermatitis, scar, fibroepithelial polyp, other dermatitis, or remaining others.
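
This two-level taxonomy can be represented as a simple mapping; the following dictionary merely restates the classes enumerated above, with string labels chosen here for illustration.

```python
# Primary pathology classes mapped to their example secondary diagnoses.
SECONDARY_CLASSES = {
    "basaloid": [
        "nodular basal cell carcinoma", "multicentric basal cell carcinoma",
        "basal cell carcinoma", "ulcerative basal cell carcinoma",
        "infiltrative basal cell carcinoma", "remaining basaloids",
    ],
    "melanocytic": [
        "dysplastic nevus", "compound nevus", "dermal nevus", "lentigo",
        "junctional nevus", "malignant melanoma", "remaining melanocytic",
    ],
    "squamous": [
        "squamous cell carcinoma", "seborrheic keratosis", "verruca",
        "actinic keratosis", "lichenoid keratosis",
        "remaining squamous cell carcinomas",
    ],
    "other": [
        "epidermal inclusion cyst", "spongiotic dermatitis", "scar",
        "fibroepithelial polyp", "other dermatitis", "remaining others",
    ],
}
```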

At 602, method 600 obtains an image, such as a whole slide image. The whole slide image itself may contain multiple images. The whole slide image may be of a tissue specimen.

At 604, method 600 segments the obtained image into a first plurality of segments, e.g., using computer vision such as Otsu's thresholding or using a convolutional neural network.
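
One way to realize 604 with Otsu's thresholding is sketched below; the tile size and minimum tissue fraction are illustrative assumptions, and a production system would operate on pyramidal whole slide formats rather than a single in-memory array.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu

def segment_tissue_tiles(image_rgb, tile_size=512, min_tissue=0.1):
    """Divide an image into tiles and keep those that contain tissue.

    Otsu's method separates bright background from darker stained tissue;
    a tile is kept when at least `min_tissue` of its pixels fall below
    the computed threshold.
    """
    gray = rgb2gray(image_rgb)
    cutoff = threshold_otsu(gray)  # global foreground/background threshold
    tiles = []
    height, width = gray.shape
    for y in range(0, height - tile_size + 1, tile_size):
        for x in range(0, width - tile_size + 1, tile_size):
            window = gray[y:y + tile_size, x:x + tile_size]
            if (window < cutoff).mean() >= min_tissue:
                tiles.append(image_rgb[y:y + tile_size, x:x + tile_size])
    return tiles
```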

Subsequent to 604, method 600 may optionally perform image adaptation, such as stain normalization, as described in detail below in Section II.

At 606, method 600 selects a second plurality of segments from among the first plurality of segments that includes at least one region of interest. A convolutional neural network, such as CNN-2 (108) as shown and described herein in reference to FIG. 1, may be used.

At 608, method 600 applies a trained convolutional neural network to obtain an output primary pathology class, which may be one of the four classes shown and described above in reference to FIGS. 1-4. The neural network may thus be trained by a training corpus comprising a set of pluralities of tissue sample image segments, each of the pluralities of tissue sample image segments comprising image segments from within the same tissue sample image, each of the pluralities of tissue sample image segments labeled according to one of a plurality of primary pathology classes (e.g., basaloid, melanocytic, squamous, or other). The plurality of primary pathology classes may thus consist of a plurality of majority primary pathology classes, where the plurality of primary pathology classes collectively include a majority of pathologies of a particular tissue type according to prevalence. The plurality of primary pathology classes may further include a class for tissue sample image segments not in the plurality of majority primary pathology classes.

At 610, method 600 provides the output primary pathology class (e.g., basaloid pattern, melanocytic pattern, squamous pattern, or other pattern). Method 600 may do so by displaying the primary pathology class on a computer screen, emailing it, delivering it to a computing device, delivering it to a clinical workflow, delivering it to a laboratory information system, or delivering it to a report generation system.

At 612, method 600 applies a trained classifier to at least the second plurality of segments selected at 606 to obtain a secondary pathology class. The trained classifier may be configured to further refine the diagnosis within the primary pathology class. Thus, the system may include a first classifier for further refining a primary basaloid patterns classification into a secondary classification of one of: nodular basal cell carcinoma, multicentric basal cell carcinoma, basal cell carcinoma, ulcerative basal cell carcinoma, infiltrative basal cell carcinoma, or remaining basaloids. The system may further include a second classifier for further refining a primary melanocytic patterns classification into a secondary classification of one of: dysplastic nevus, compound nevus, dermal nevus, lentigo, junctional nevus, malignant melanoma, or remaining melanocytic. The system may further include a third classifier for further refining a primary squamous patterns classification into a secondary classification of one of: squamous cell carcinoma, seborrheic keratosis, verruca, actinic keratosis, lichenoid keratosis, or remaining squamous cell carcinomas. The system may further include a fourth classifier for further refining a primary other patterns classification into a secondary classification of one of: epidermal inclusion cyst, spongiotic dermatitis, scar, fibroepithelial polyp, other dermatitis, or remaining others. Known techniques may be used to generate and train these secondary classifiers. Thus, the trained classifier of 612 may be trained by a training corpus that includes a set of secondary pluralities of tissue sample image segments, each set of secondary pluralities comprising image segments from within the same tissue sample image, each set of secondary pluralities labeled according to some pathology subcategory within the classification primary pathology category.

At 614, method 600 provides the secondary pathology class. Method 600 may do so by displaying the secondary pathology class on a computer screen, emailing it, delivering it to a computing device, delivering it to a clinical workflow, delivering it to a laboratory information system, or delivering it to a report generation system.

C. Confidence Level Thresholding

The reduction to practice described above was developed entirely using 5,070 whole slide images from a single Reference Lab. Since there is a large amount of variety in both the presentation of skin lesion pathology as well as scanner- or preparation-induced abnormalities, the model may assess a confidence for each decision; thereby, likely-misclassified images can be flagged as such. Embodiments may set confidence thresholds a priori based only on performance on the validation set of the Reference Lab (or its analog in new embodiments), which is independent of the data for which the measures of system performance were reported.

In general, Gal, Y. & Ghahramani, Z., Dropout as a Bayesian approximation: Representing model uncertainty in deep learning, International Conference on Machine Learning, 1050-1059 (2016) suggests that the predictive probability obtained from a classifier is a single point estimate, and therefore cannot be reliably interpreted as a measure of prediction confidence. This paper also proposes a method to reliably measure the uncertainty of a decision made by a classifier. Embodiments may adapt this technique to use it for confidence scoring of the decision.

To determine a confidence score for a whole slide image, embodiments may perform a prediction on the same whole slide image repeatedly (e.g., using CNN-3, 111) by omitting a random subset of neurons from the prediction (e.g., retaining 70% of the neurons, or more generally any number between 50% and 99%). The subsets may be selected randomly or pseudorandomly. Each repetition results in a prediction made using a different subset of feature representations. The reduction to practice used T=30 repetitions, where each repetition, i, yields a prediction vector P_(i) of sigmoid values of length equal to the number of classes; however, any number of repetitions between 10 and 100 may be used. Each element of P_(i) represents the binary probability p_(i,c) of the corresponding whole slide image belonging to class c. The confidence score s for a given whole slide image may then be computed as follows, by way of non-limiting example:

$s = \max_{c}\left( \frac{\sum_{i=1}^{T} p_{i,c}}{T} \right)$

The class associated with the highest confidence s is the predicted class for the whole slide image; finally, the specimen prediction is assigned as the maximum-confidence prediction of its constituent whole slide image predictions. If a specimen's confidence score is below a certain threshold, then the prediction is considered unreliable and the specimen remains unclassified.

Three threshold values for the confidence score were selected for analysis of the reduction to practice as shown and described herein in reference to FIGS. 1-5; these were determined during the development phase, using only the Reference Lab's data, because this confidence threshold qualifies as a tunable model parameter. Confidence thresholds were selected such that discarding specimens with sigmoid confidence lower than the threshold yielded a set accuracy in the remaining specimens of the validation set of the Reference Lab. The three target accuracy levels were 90%, 95% and 98% (see FIG. 2); the corresponding sigmoid confidence thresholds of 0.33, 0.76, and 0.99 correspond to confidence Levels 1, 2, and 3, respectively. These confidence thresholds were held fixed, and applied without modification to the test sets from the three test labs.

Embodiments may determine a confidence level for classifications, such as the primary pathology class, as follows. According to various embodiments, during classification, means (or other averages) of sigmoids for each classification over a plurality (e.g., 30, or more generally any number between 10 and 100) of reduced-neuron (e.g., 70% of the neurons, or more generally any number between 50% and 99%) iterations may be compared, and a maximum may be selected as indicating the classification, e.g., into a primary classification class. That is, the class corresponding to the maximum averaged sigmoid may be selected as the classification class. Some embodiments may further utilize such sigmoid values for comparison to threshold values corresponding to confidence levels. If a sigmoid value for a given classification is at least as great as such a threshold, then the classification is accurate with a known level of confidence corresponding to the threshold sigmoid value. This process is shown and described in detail presently in reference to FIG. 7.

FIG. 7 is a flow diagram for a method 700 of determining a threshold corresponding to a confidence level for classifications according to various embodiments. According to various embodiments, selecting a threshold for the maximum averaged sigmoid value that corresponds to a particular confidence for the classification may be performed empirically, as partially illustrated by curve 210 of FIG. 2, above. Such selecting per method 700 may proceed as follows.

At 702, a validation set of classified images for a collection of specimens is selected. Each image in the validation set of images may have an assigned (e.g., human assigned) classification. The validation set may be for a number of specimens, with multiple images (e.g., 1-5) in the validation set for each specimen. For example, the validation set of images may include 700 images of 500 specimens. The validation set may be selected to have been produced by the same lab that generated the training images during the training phase.

At 704, the maximum averaged sigmoid value for each image in the validation set of images is computed. This may be accomplished by applying a trained neural network as disclosed herein to each image in the validation set of images. Such application produces a classification and a corresponding maximum averaged sigmoid value for each image.

At 706, identify the maximum averaged sigmoid value for each specimen depicted in the validation set of images by, for example, selecting the maximum such value over the images (e.g., 1-5 images) in the validation set of images that correspond to a given specimen. At this point, each specimen with images in the validation set of images has an associated maximum averaged sigmoid value and an associated classification accuracy, which may be either “1” to indicate correct classification or “0” to indicate incorrect classification.

At 708, plot (or otherwise compare) the classification accuracy of the specimens represented in the validation set versus a plurality of hypothetical threshold values. That is, for each hypothetical threshold value, plot (or otherwise obtain) a classification accuracy value (e.g., a percentage) for only those specimens having a corresponding maximum averaged sigmoid value that is at least as great as the hypothetical threshold value. (An example curve 210 is provided in FIG. 2.)

At 710, select a threshold value from the hypothetical threshold values that has a corresponding accuracy of specimen classification. For example, as depicted by curve 210 of FIG. 2, for the reduction to practice, a threshold value of about 0.3 corresponds to a 90% accurate classification of the specimens in the validation set.

To use the threshold value, compare a maximum averaged sigmoid value resulting from classifying a novel image to the threshold. If greater, then the image is classified correctly with a confidence level related to the validation set classification accuracy for the selected threshold value. For example, for a hypothetical threshold value of 0.3 corresponding to a 90% accurate classification of the specimens in the validation set, and for a maximum averaged sigmoid value for a novel image of, say, 0.321, the classification value is greater than the threshold value.
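
The threshold-selection procedure of 702-710 can be condensed into a short routine; the following is a sketch of the empirical sweep, assuming per-specimen scores and correctness flags have already been gathered as described above.

```python
import numpy as np

def select_confidence_threshold(specimen_scores, specimen_correct,
                                target_accuracy=0.90):
    """Return the smallest threshold whose retained specimens meet a target accuracy.

    specimen_scores: maximum averaged sigmoid per validation specimen (706).
    specimen_correct: 1 for correct classification, 0 for incorrect (706).
    Sweeps hypothetical thresholds (708) and picks one achieving the target (710).
    """
    scores = np.asarray(specimen_scores, dtype=float)
    correct = np.asarray(specimen_correct, dtype=float)
    for threshold in np.sort(np.unique(scores)):
        kept = scores >= threshold
        if kept.any() and correct[kept].mean() >= target_accuracy:
            return float(threshold)
    return None  # target accuracy unattainable on this validation set
```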

In practice, in one dataset, when a classification value exceeds a threshold value corresponding to 90% accurate validation set specimen classification, the accuracy of the novel classification has been determined to be 83%.

II. Stain Normalization

A. Introduction

Because the appearance of a whole slide image varies from lab to lab, scanner to scanner, with the same scanner over time, and even based on the brand of stain used, images acquired from different labs or scanners should be adapted before they are interpretable by a model that is trained on the variations of a single lab or image appearance. No current methods correct the whole slide image variation factors described above in the Background section by pre-processing the images to be assessed by a deep learning system. Embodiments may be used to solve such problems, as described presently.

B. Stain Normalization and Other Augmentation

Due to the inconsistency of traditional machine-learning-based approaches (the current state of the art in stain normalization), some embodiments utilize deep learning. There are several existing methods of handling stain variation using deep learning (e.g., StainGANs), but Generative Adversarial Networks (GANs) are notoriously difficult to train.

R. Zhang, P. Isola, and A. A. Efros, “Colorful image colorization,” European Conference on Computer Vision, pp. 649-666, Springer, 2016, hereinafter “Zhang”, presents a technique for learning the colors of black-and-white photographs. Zhang trained a model to learn colors from black-and-white photos, using only the lightness channel of a ‘Lab’ color space image as input. Zhang's deep learning model was adapted and utilized for an entirely different purpose according to various embodiments. In particular, some embodiments utilize such an adapted model in the histopathology domain in order to learn to predict the stain, based only on the lightness channel of the image. In further contrast to Zhang, some embodiments restrict the range of possible colors to the range of those exhibited by the H&E stains from a single lab. By training a neural network using data from a single lab (the input is the lightness channel of the patches from the training site's data, and the outputs are quantized ‘a’ and ‘b’ channels of the corresponding images, which represent color), some embodiments can learn to predict the lab's staining pattern based on the features of the lightness channel. Then, when this model is used for prediction on images from lab 2, the model takes the lightness channel and predicts a stain for it which is similar to that of lab 1.
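
For illustration, the construction of such a training pair might be sketched as follows, assuming scikit-image for the color space conversion; Zhang's quantization of the ‘a’/‘b’ channels into discrete color bins is omitted here, and the function name is illustrative.

```python
from skimage import color

def make_training_pair(rgb_patch):
    """Split an H&E patch (H x W x 3 RGB array) into a colorization pair:
    the network input is the lightness (L) channel alone, and the target
    is the 'a' and 'b' chrominance channels of the same patch."""
    lab = color.rgb2lab(rgb_patch)
    lightness = lab[..., :1]   # input: L channel only (H x W x 1)
    chroma = lab[..., 1:]      # target: 'a' and 'b' channels (H x W x 2)
    return lightness, chroma
```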

In addition to stain, some embodiments address some of the other components of inter-lab image variation with a model setup similar to colorization. To do this, some embodiments utilize augmentation during the training of the model. By adding noise to the input images, jittering hue, saturation, brightness, and contrast, and modifying the rotation of the input to the colorization model, such embodiments may account for some of these differences in both the tissues themselves as well as scanner properties. These augmentations are not applied to the outputs of the colorization model (channels ‘a’ and ‘b’, which represent color), with the exception of rotation. By learning to predict the original image, which did not receive augmentations, such a model learns to adapt or map an input image to an expected image appearance. Due to this alteration of image properties, this process may be referred to as “adaptation” since it is not a straightforward normalization.
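
A sketch of this asymmetric augmentation follows, assuming torchvision and the hypothetical make_training_pair above: photometric jitter and noise touch only the image the input lightness channel is taken from, while rotation is shared so that input and target stay aligned. The jitter ranges are illustrative, not from the disclosure.

```python
import random
import torch
import torchvision.transforms.functional as TF

def augment_pair(rgb_patch):
    """rgb_patch: 3 x H x W float tensor in [0, 1]."""
    # Rotation is the one augmentation applied to both sides of the pair.
    angle = random.choice([0, 90, 180, 270])
    rotated = TF.rotate(rgb_patch, angle)

    # Photometric jitter and additive noise apply to the input side only.
    perturbed = TF.adjust_hue(rotated, random.uniform(-0.05, 0.05))
    perturbed = TF.adjust_saturation(perturbed, random.uniform(0.9, 1.1))
    perturbed = TF.adjust_brightness(perturbed, random.uniform(0.9, 1.1))
    perturbed = TF.adjust_contrast(perturbed, random.uniform(0.9, 1.1))
    perturbed = (perturbed + 0.01 * torch.randn_like(perturbed)).clamp(0, 1)

    def to_hwc(t):  # (3, H, W) tensor -> (H, W, 3) array
        return t.permute(1, 2, 0).numpy()

    lightness, _ = make_training_pair(to_hwc(perturbed))  # perturbed input
    _, chroma = make_training_pair(to_hwc(rotated))       # un-jittered target
    return lightness, chroma
```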

Some embodiments transform images from a new source to have an appearance similar to images from the original source, such that a deep learning model has no drop in accuracy when classifying images from the new source.

Some embodiments provide a deep-learning-based method of preprocessing image inputs to a deep learning model by transforming the images themselves into a space that shares the visual characteristics of the original training set. (This is related to domain adaptation, but rather than transforming the diagnosis model to work on a new domain, the model's input itself is transformed into the domain that the model already recognizes).

During training, the input to this model is a set of images which have had their characteristics randomly perturbed, and the output is the original images. During prediction, the input is an image from another lab (unperturbed), and the predicted output is what the image would look like if it had come from the original lab with which the model was trained. The effect of this model is to enable accurate prediction with subsequent deep-learning-based classification models which were trained on only a single type of image (in this case, images from a single lab and scanner). This process is shown and described presently in reference to FIG. 8.
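
Under the assumptions above, one training step might look like the following sketch. The small colorization_net is a hypothetical stand-in for the adapted Zhang model, a plain regression loss stands in for Zhang's classification over quantized color bins, and channel normalization is omitted for brevity.

```python
import torch

colorization_net = torch.nn.Sequential(   # illustrative stand-in network
    torch.nn.Conv2d(1, 32, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(32, 2, 3, padding=1),
)
optimizer = torch.optim.Adam(colorization_net.parameters(), lr=1e-4)

def training_step(rgb_patch):
    """One step: predict the un-perturbed original's color channels from
    the perturbed input's lightness channel (augment_pair is sketched
    above)."""
    lightness, chroma_target = augment_pair(rgb_patch)
    x = torch.from_numpy(lightness).float().permute(2, 0, 1).unsqueeze(0)
    y = torch.from_numpy(chroma_target).float().permute(2, 0, 1).unsqueeze(0)

    optimizer.zero_grad()
    chroma_pred = colorization_net(x)
    loss = torch.nn.functional.mse_loss(chroma_pred, y)
    loss.backward()
    optimizer.step()
    return float(loss)
```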

FIG. 8 is a high-level flow diagram of a method 800 for stain normalization using deep learning according to various embodiments. Method 800 may be performed by a computer system that includes at least one electronic processor and electronic persistent memory, coupled to the at least one electronic processor, that includes instructions that configure the at least one electronic processor to perform the actions of method 800. For example, method 800 may be implemented using system 100 of FIG. 1, particularly as CNN-1 (106).

At 802, method 800 obtains a digitized biological tissue image. The image may be a whole slide image. The whole slide image itself may contain multiple images. The whole slide image may be of a tissue specimen.

At 804, method 800 applies, to at least a portion of the digitized biological tissue image, a convolutional neural network trained for stain normalization (and possibly additional image augmentation operations). The convolutional neural network may be trained using a training corpus that includes a plurality of pairs of images, where each pair of images of the plurality of pairs of images includes a first image restricted to a lightness axis of a color space and a second image restricted to at least one of: a first color axis of the color space and a second color axis of the color space. Each pair of images may include a first image restricted to a lightness axis of a Lab color space and a second image restricted to an axis ‘a’ of the Lab color space and to an axis ‘b’ of the Lab color space. Each second image may be restricted to colors of hematoxylin and eosin. The plurality of pairs of images may include a particular plurality of pairs of images, where each pair of images of the particular plurality of pairs of images comprises a first image that has had noise (e.g., hue noise, saturation noise, brightness noise, contrast noise, or intensity noise) added to it. The plurality of pairs of images may include a rotated plurality of pairs of images, where each pair of images of the rotated plurality of pairs of images includes a first image that has been rotated by an amount, and a second image that has been rotated by the amount. The training corpus may consist of pairs of images derived from images obtained by a single laboratory. The applying causes an output image to be produced, where the output image has been stain normalized and possibly additionally augmented.
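
At inference, the application at 804 amounts to keeping the input's lightness and replacing its color with the stain the network learned from the original lab. A minimal sketch follows, reusing the hypothetical colorization_net and the scikit-image conversions from the earlier sketches:

```python
import numpy as np
import torch
from skimage import color

def stain_normalize(rgb_image, colorization_net):
    """rgb_image: H x W x 3 RGB array. Returns an RGB image whose
    lightness is unchanged and whose 'a'/'b' channels are predicted
    by the trained colorization network."""
    lab = color.rgb2lab(rgb_image)
    lightness = lab[..., :1]
    x = torch.from_numpy(lightness).float().permute(2, 0, 1).unsqueeze(0)
    with torch.no_grad():
        chroma_pred = colorization_net(x)         # predicted 'a' and 'b'
    chroma = chroma_pred.squeeze(0).permute(1, 2, 0).numpy()
    normalized_lab = np.concatenate([lightness, chroma], axis=-1)
    return color.lab2rgb(normalized_lab)  # back to RGB for downstream models
```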

At 806, method 800 provides the output image. The image may be provided by displaying it on a computer monitor, sending it in an email, sending it to a computer system, or providing it to a process (e.g., method 600 of FIG. 6) that uses an electronic trained classifier (e.g., CNN-3, 111) that is trained to identify at least one human biological tissue pathology, such as a human dermatopathology.

Thus, some embodiments provide a system for, and a method of, stain normalization image processing for digitized biological tissue images. Such embodiments may obtain a digitized biological tissue image. Such embodiments may apply to at least a portion of the digitized biological tissue image an at least partially computer implemented convolutional neural network trained using a training corpus comprising a plurality of pairs of images, where each pair of images of the plurality of pairs of images comprises a first image restricted to a lightness axis of a color space and a second image restricted to at least one of: a first color axis of the color space and a second color axis of the color space, such that the applying causes an output image to be produced. Such embodiments may provide the output image, e.g., to a system or method of automated whole-slide image classification using deep learning such as disclosed in Section I, or by displaying it on a computer monitor. More particularly, an embodiment may be used as CNN-1 (106) of FIG. 1.

Certain embodiments can be performed using a computer program or set of programs. The computer programs can exist in a variety of forms, both active and inactive. For example, the computer programs can exist as software program(s) comprised of program instructions in source code, object code, executable code, or other formats; firmware program(s); or hardware description language (HDL) files. Any of the above can be embodied on a transitory or non-transitory computer readable medium, which includes storage devices and signals, in compressed or uncompressed form. Exemplary computer readable storage devices include conventional computer system RAM (random access memory), ROM (read-only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), and magnetic or optical disks or tapes.

While the invention has been described with reference to the exemplary embodiments thereof, those skilled in the art will be able to make various modifications to the described embodiments without departing from the true spirit and scope. The terms and descriptions used herein are set forth by way of illustration only and are not meant as limitations. In particular, although the method has been described by examples, the steps of the method can be performed in a different order than illustrated or simultaneously. Those skilled in the art will recognize that these and other variations are possible within the spirit and scope as defined in the following claims and their equivalents.

What is claimed is:
1. A method of stain normalization image processing for digitized biological tissue images, the method comprising: obtaining a digitized biological tissue image; applying to at least a portion of the digitized biological tissue image an at least partially computer implemented convolutional neural network trained using a training corpus comprising a plurality of pairs of images, wherein each pair of images of the plurality of pairs of images comprises a first image restricted to a lightness axis of a color space and a second image restricted to at least one of: a first color axis of the color space and a second color axis of the color space, whereby the applying causes an output image to be produced; and providing the output image.
2. The method of claim 1, wherein the digitized biological tissue image comprises a digitized whole-slide image, and wherein the applying comprises applying to the whole-slide image.
3. The method of claim 1, wherein the providing comprises providing to a process comprising an electronic trained classifier, wherein the electronic trained classifier is trained to identify at least one human biological tissue pathology.
4. The method of claim 3, wherein the digitized biological tissue image comprises a digitized cutaneous biological tissue image, and wherein the at least one biological tissue pathology comprises at least one human dermatopathology.
5. The method of claim 1, wherein each pair of images of the plurality of pairs of images comprises a first image restricted to a lightness axis of a Lab color space and a second image restricted to an axis ‘a’ of the Lab color space and to an axis ‘b’ of the Lab color space.
6. The method of claim 1, wherein each second image is restricted to colors of hematoxylin and eosin.
7. The method of claim 1, wherein the plurality of pairs of images comprises a particular plurality of pairs of images, wherein each pair of images of the particular plurality of pairs of images comprises a first image that has had noise added to it.
8. The method of claim 7, wherein the noise comprises at least one of hue noise, saturation noise, brightness noise, contrast noise, or intensity noise.
9. The method of claim 1, wherein the plurality of pairs of images comprises a rotated plurality of pairs of images, wherein each pair of images of the rotated plurality of pairs of images comprises a first image that has been rotated by an amount, and a second image that has been rotated by the amount.
10. The method of claim 1, wherein the training corpus consists of pairs of images derived from images obtained by a single laboratory.
11. A system for stain normalization image processing for digitized biological tissue images, the system comprising at least one electronic processor and at least one persistent electronic memory communicatively coupled to the at least one electronic processor, the at least one persistent electronic memory comprising computer readable instructions that, when executed by the at least one electronic processor, configure the at least one electronic processor to perform operations comprising: obtaining a digitized biological tissue image; applying to at least a portion of the digitized biological tissue image an at least partially computer implemented convolutional neural network trained using a training corpus comprising a plurality of pairs of images, wherein each pair of images of the plurality of pairs of images comprises a first image restricted to a lightness axis of a color space and a second image restricted to at least one of: a first color axis of the color space and a second color axis of the color space, whereby the applying causes an output image to be produced; and providing the output image.
12. The system of claim 11, wherein the digitized biological tissue image comprises a digitized whole-slide image, and wherein the applying comprises applying to the whole-slide image.
13. The system of claim 11, wherein the providing comprises providing to a process comprising an electronic trained classifier, wherein the electronic trained classifier is trained to identify at least one human biological tissue pathology.
14. The system of claim 13, wherein the digitized biological tissue image comprises a digitized cutaneous biological tissue image, and wherein the at least one biological tissue pathology comprises at least one human dermatopathology.
15. The system of claim 11, wherein each pair of images of the plurality of pairs of images comprises a first image restricted to a lightness axis of a Lab color space and a second image restricted to an axis ‘a’ of the Lab color space and to an axis ‘b’ of the Lab color space.
16. The system of claim 11, wherein each second image is restricted to colors of hematoxylin and eosin.
17. The system of claim 11, wherein the plurality of pairs of images comprises a particular plurality of pairs of images, wherein each pair of images of the particular plurality of pairs of images comprises a first image that has had noise added to it.
18. The system of claim 17, wherein the noise comprises at least one of hue noise, saturation noise, brightness noise, contrast noise, or intensity noise.
19. The system of claim 11, wherein the plurality of pairs of images comprises a rotated plurality of pairs of images, wherein each pair of images of the rotated plurality of pairs of images comprises a first image that has been rotated by an amount, and a second image that has been rotated by the amount.
20. The system of claim 11, wherein the training corpus consists of pairs of images derived from images obtained by a single laboratory.